Abstract

The purpose of this study is to investigate how differences among data collectors are associated with differences in
the reliability and validity of scores on affective variables (motivation toward science learning and science
attitude) measured by Likert-type scales. Four researchers trained in data collection and seven science
teachers who did not undergo any training gathered data from 391 ninth-grade students. The data collection
instruments were the “Motivation toward Science Learning Scale” and the “Science Attitude Scale.” Data collection
was conducted in four stages: two were carried out four weeks apart by the researchers, and the remaining two
were carried out four weeks apart by the teachers. Principal component analysis, confirmatory factor analysis,
Cronbach’s alpha reliability analysis, a Pearson correlation test for convergent validity, and t-tests for the
differences between the mean scores of each data collection stage were used for the data analysis. The results
showed that motivation toward science learning and attitude toward science were high, but the factor structures
and reliability values obtained by different data collectors differed for the two scales. The convergent validity
between the scores on the two scales was shown to be sufficient for the measurements. However, the difference
tests on the mean scores of the applications showed a statistically significant difference between the mean
scores of the two motivation scale applications by the teachers.


Keywords 

Data Collector, Motivation toward Learning Science, Science Attitude, Validity, Reliability. 


In the science education literature, Likert-type scales
are frequently used for data collection, but researchers
rely on different data collectors even when they use
the same type of scale. Although the same scale is used
in different studies, the use of different data collectors
might make an important difference in the research
results (Fraenkel & Wallen, 2003). The differences
arising from data collectors are an important factor
threatening internal validity in research (Fraenkel &
Wallen, 2003). Therefore, data collector characteristics
become an important factor in the data collection
process (Fraenkel & Wallen, 2003; Miyazaki & Taylor,
2008). The scale implementation process includes
procedures to take this into account and requires
expertise. In this process, the implementers try to
proceed properly by using handbooks about the scale
(Brener, McManus, Galuska, Lowry, & Wechsler, 2003).
Whether data collectors have undergone training is an
important component of data collection, but some
studies in the field of science education do not give
any information about data collectors (Akpinar,
Aktamiş, & Ergin, 2005; Gomleksiz & Bulut, 2006;
Yildiz, Akpinar, Aydogdu, & Ergin, 2006). In many
cases, data are probably collected by teachers.
However, how to develop and apply a scale for research
is not taught to pre-service science teachers who are
working toward their bachelor's degree at Turkish
universities. In spite of the need for data collection
to solve the problems in Turkey's educational system,
there is no strong training course in line with this
purpose. Turkey is among the least successful countries
in the PISA examination (The Organisation for Economic
Co-operation and Development, 2009), indicating a need
to collect more data about where the problem lies. To
meet this need, it is necessary to check data collection
processes that use Likert scales for the data collector
effect.

Although insufficient information on data collector 
characteristics is reported in papers, the differences 
among data collectors in terms of whether or not 
they have received training might change the 
reliability and validity of the scores collected by 
Likert scale applications. For example, Rogers 
(1976) stated that task- or individual-oriented 
data collection processes make a difference in 
consistency in data collection. Reliability and 
validity are characteristics of scores obtained from 
a scale and are two factors that have an effect on 
the quality of inference after the measurement 
(American Educational Research Association, 
1999; Del Greco, Walop, & McCarthy, 1987). 
Discrepancies originating from the data collector 
can lead to differences in the values of reliability 
and validity, thereby negatively influencing the 
accuracy of inferences based on measurements. The 
importance of this problem in terms of obtaining 
results in survey research using Likert-type scales in 
science education sets the framework of this study. 
Thus, the problem is examined by investigating the
reliability and validity of measurements regarding
two affective variables (i.e., motivation toward
science learning and science attitude) that are
measured using Likert scales in science education.

In the education literature, motivation and attitude
are frequently researched affective factors (Bong,
2001; Dede & Yaman, 2008; Douglas, 2006;
Kahyaoglu, 2013; Koballa & Glynn, 2007; Oguz
Çakir, 2011; Osborne, Simon, & Collins, 2003;
Pintrich, 1999; Pintrich & DeGroot, 1990; Savran
& Çakiroglu, 2001; Serin, 2009; Simpson, Koballa,
Oliver, & Crawley, 1994; Temiz, 2010; Wigfield
& Eccles, 2000; Yenice, Saydam, & Telli, 2012)
that are measured with Likert-type scales (Çavaş,
2011; Dede & Yaman, 2008; Savran & Çakiroglu,
2001; Tuan, Chin, & Shieh, 2005; Yilmaz & Çavaş
Huyugüzel, 2007; Yumuşak, Sungur, & Çakiroglu,
2007). Motivation is an affective characteristic
that influences acting to reach a purpose
(Brophy, 1998). For research on motivation in
science education, “Students’ Motivation toward
Science Learning (SMTSL)” developed by Tuan
et al. (2005) is an important scale because it has
been applied to large samples and has high values
of reliability and validity. Moreover, this scale was
adapted to Turkish by Yilmaz and Çavaş Huyugüzel
(2007). The “Science Attitude Scale (SAS)”
developed by Geban, Ertepinar, Yilmaz, Atlan, and
Şahpaz (1994) is another Likert-type scale used
frequently in Turkey (Bilgin & Karaduman, 2005;
Çavaş, 2011; Kenar & Balci, 2012; Ozyilmaz &
Hamurcu, 2005; Tatar & Kuru, 2009; Unal & Ergin,
2006). Studies focusing on these two affective
variables present information about the reliability
and validity values, but no information is given
about data collectors. Consequently, investigating
the possible effect of data collector differences on
validity and reliability is an important contribution
for current science education studies and future
studies that will use Likert-type scales.

The purpose of this study is to investigate how data 
collector differences are reflected in the reliability 
and validity of scores regarding affective variables 
(motivation toward science learning and science 
attitude) that are measured by Likert scales. 

Method 

In this study, reliability and validity values of the
data gathered by different data collector groups
were investigated by utilizing a survey approach
(Karasar, 1999; Wallen & Fraenkel, 2001). The data
were collected from 391 (184 female, 107 male)
ninth-grade Anatolian high school students. The
data collectors were four researchers (2 female, 2
male) and seven science teachers (1 female, 6 male).
The researchers received a two-week training
on how to apply the scale (two hours per week).
The training content consisted of introducing the
research subject of the scale application, explaining
the purpose of the application, stating possible
advantages and disadvantages of the application,
and explaining ethical issues, dress style, and use
of language. The teachers, in contrast, applied the
scales without undergoing any training. The four data
collection applications were conducted separately.
They consisted of two initial scale applications and
two scale applications implemented four weeks later.
Figure 1 shows the model of the scale applications.

Figure 1: Data collection applications.

The data collection instruments were the SMTSL and
SAS. The original version of the first scale, the
SMTSL, was developed by Tuan et al. (2005). The scale
was then adapted into Turkish by Yilmaz and Çavaş
Huyugüzel (2007). The Turkish version of the scale
consists of six factors (self-efficacy, active learning
strategies, science learning value, performance goal,
achievement goal, and learning environment stimulation)
and includes 33 items. The reliability analysis of the
scores showed that Cronbach's alpha values of the
factors were between .54 and .85, and the alpha value
for the total scores on the scale was .87. Two examples
of the scale items are “When I find the science content
difficult, I do not try to learn it” and “In science, I
think that it is important to learn to solve problems.”
The SAS, the second scale, was developed by Geban et
al. (1994). Başer (1996) reported that the SAS included
15 items and had one factor. In addition, Cronbach's
alpha value of the scores on the scale was .83. Two
example items in the scale are “I am bored when I study
science subjects” and “I want to learn more about
science subjects.” Confirmatory and exploratory factor
analyses (principal component analysis and varimax
rotation) for construct validity, Cronbach's alpha
reliability analysis, a Pearson correlation test for
convergent validity, and a t-test for the differences
between mean scores of each data collection stage
were used for the data analysis. For the t-test analysis,
a Bonferroni correction was applied, and the alpha value
was set at .006. The AMOS and SPSS 18 software packages
were used for all analyses.
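
As an illustration of two of the analyses named above, the following is a minimal Python sketch, not the authors' SPSS procedure. It computes Cronbach's alpha from an item-score matrix with the standard variance-based formula and shows the arithmetic of a Bonferroni correction. The function names and the toy responses are hypothetical. The choice of eight comparisons is an assumption: the study does not state the number, but eight t-tests are reported in the findings, and .05 / 8 = .00625 is consistent with the reported alpha of .006.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


# Hypothetical 5-point Likert responses (5 respondents x 3 items),
# invented for illustration only; these are not data from the study.
toy_responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

print(round(cronbach_alpha(toy_responses), 3))
# Assumed: 8 comparisons at a family-wise alpha of .05
print(bonferroni_alpha(0.05, 8))
```

Because alpha depends on the item variances and the total-score variance of the particular sample, the same instrument can yield different alpha values in different applications, which is the kind of variation this study examines.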

Findings 

The findings of this study are presented under three
main headings: construct validity and reliability,
convergent validity, and a t-test for the differences
between the mean scores in each application.

Construct Validity and Reliability 

Findings Regarding Construct Validity and
Reliability (SMTSL): The confirmatory factor
analysis results for each application indicated
that although χ²/df was between 1.58 and 2.28,
the other indexes for each application were not
acceptable for the proposed factor model (GFI:
.62-.71; CFI: .64-.77; RMSEA: .08-.11) (Hoyle,
2000; Marsh, Balla, & McDonald, 1988; Marsh &
Hocevar, 1988; Raykov & Marcoulides, 2006). In
addition, the χ²/df and RMSEA indexes differed
between the trained and untrained data collectors
for the variables in focus.

Because of the confirmatory analysis results, an
exploratory factor analysis was carried out. Before the
analysis, the Kaiser-Meyer-Olkin (KMO) measure of
sampling adequacy and Bartlett’s test of sphericity
were calculated. The results (KMO > .60, p < .05)
showed that the data were suitable for factor analysis
(Sharma, 1996; Tavşancil, 2002). According to the
principal component analysis, the scores collected
by different data collectors revealed different factor
structures, and the total explained variances also
differed across applications. The item loading
values and the factors on which items loaded were
also different for the same instrument. The reliability
results showed that the reliability values of the
factors varied widely, between .34 and .86. The total
reliability value for each data collector group was .82
and .89, respectively.
Findings Regarding Construct Validity and
Reliability (SAS): The confirmatory factor analysis
results for each application indicated that except
for χ²/df (1.40-3.45) and the CFI value (.83-.95),
the other indexes for each application were not
acceptable for the proposed factor model (GFI:
.77-.82; RMSEA: .08-.14) (Hoyle, 2000; Marsh,
Balla, & McDonald, 1988; Marsh & Hocevar, 1988;
Raykov & Marcoulides, 2006).

After the confirmatory factor analysis, it was
decided to carry out an exploratory factor analysis.
According to the KMO measure of sampling adequacy
and Bartlett's test of sphericity (KMO > .60,
p < .05), the data were suitable for factor analysis
(Sharma, 1996; Tavşancil, 2002). The results of the
principal component analysis were similar to those
for the SMTSL; in addition, the items omitted after
the analyses were not the same in each application.
The reliability values of the factors varied widely,
ranging from .60 to .92. The total reliability value
for each data collector group was .88 and .92,
respectively.

Convergent Validity 

The convergent validity was examined by 
investigating correlations between the scores on the 
motivation and attitude scales. The results indicated 
that there was a statistically significant and positive 
relationship between the scores of motivation and 
attitude for each application (r = .56-.66, p < .05). 
These results were as expected in terms
of convergent validity (Singh, Graville, & Dika,
2002; Tuan et al., 2005).
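
The convergent validity check above rests on the Pearson correlation between students' total scores on the two scales. A minimal sketch of that computation is given below, assuming one total score per student on each scale. The function name and the toy scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for six students (illustration only)
motivation_totals = [110, 95, 130, 120, 88, 101]
attitude_totals = [52, 47, 66, 55, 40, 50]

print(round(pearson_r(motivation_totals, attitude_totals), 2))
```

A positive coefficient of moderate to high size, as reported for each application in this study, is what would be expected if the two scales measure related affective constructs.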

T-Test for the Differences between the Mean 
Scores of Each Application 

The results of the difference tests between the mean
scores of the applications showed that there was
no statistically significant difference between
the mean scores of the two motivation and attitude
scale applications by the researchers (t_SMTSL = 1.66,
p > .006; t_SAS = 0.45, p > .006). Accordingly, the
results had no practical importance in terms of effect
sizes (Coe, 2002).

A non-significant difference was also found between
the mean scores of the two attitude scale applications
by the teachers (t = 0.51, p > .006). However, there
was a statistically significant difference between the
mean scores of the two motivation scale applications
by the teachers (t = 3.15, p < .006). In addition,
there was no statistically significant difference
between the researchers’ and teachers’ mean scores in
the first and second motivation and attitude scale
applications (first application: t_SMTSL = 0.39,
p > .006; t_SAS = 1.09, p > .006; second application:
t_SMTSL = 2.59, p > .006; t_SAS = 0.95, p > .006).
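
The comparisons above pair a t statistic with a judgment about effect size. The sketch below shows one common way to compute both for two groups of scores. It is only an assumption about the form of the test: the study does not say whether paired or independent-samples t-tests were used, nor which effect size measure was applied, so the pooled-variance independent-samples t and Cohen's d are used here purely for illustration. The function names and toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two independent groups."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)


def independent_t(a, b):
    """Pooled-variance t statistic for the difference between two group means."""
    sp2 = pooled_variance(a, b)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(a) - mean(b)) / sqrt(pooled_variance(a, b))


# Hypothetical scale totals from two applications (illustration only)
first_application = [101, 96, 110, 120, 99, 105]
second_application = [98, 97, 104, 111, 95, 100]

print(round(independent_t(first_application, second_application), 2))
print(round(cohens_d(first_application, second_application), 2))
```

The resulting t value would then be compared against the Bonferroni-corrected alpha of .006 rather than the conventional .05.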

Discussion and Suggestions 

This study found that reliability and validity values
differed significantly across the data collection
applications. According to the confirmatory and
exploratory factor analyses, factor structures,
item loadings on the factors, and index values
differed between the two applications conducted
four weeks apart by the researchers and by the
teachers. These differences may arise from
differences in the data collectors’ characteristics
despite their having had the same training (Fraenkel
& Wallen, 2003). Clear differences were seen
especially among the applications of the teacher data
collectors. For instance, there was a statistically
significant difference between the two motivation
scale applications by the teachers. These findings
show that motivation data obtained by teachers
yielded two different results when collected at
different times. Consequently, it can be speculated
that this reflects the effect of data collector
differences on the construct validity and data
stability of the SMTSL. Total reliability values
for the motivation scale were similar between the
data of the researchers and the teachers. However,
the important point is that the reliability of
different factors cannot be compared because the
factors do not share a common structure.

A look at the second variable of this study shows
that attitude scores also differ in terms of
factor structures and reliability values. Pol and
Ponzurick (1989) suggested that attitude is an
affective variable susceptible to data collector
characteristics, and the findings of this study
support that suggestion. These findings also mean
that motivation, similar to attitude, is susceptible
to data collector characteristics. In addition, these
findings supported previous studies by Eryilmaz
(2002), Behi and Nolan (1996), and Miyazaki and
Taylor (2008), who explained that training on the
data collection process, experience in data
collection, gender, race, and age were important
factors in explaining differences in the data
collected by various data collectors. Moreover,
Sondergeld and Johnson (2014) emphasized that
factor structures in scales may differ depending
on the sample, and this creates difficulties in the
comparison of different study results. Thus, it can
be considered that researchers’ disregard of data
collector differences threatens the reliability and
validity of data obtained from Likert-type scales.
An important way to reduce these differences is to
train the data collectors, but this study’s findings
showed that training status alone is not sufficient to
provide strong reliability and validity.

The other finding regarding convergent validity
supported the literature in terms of the relationship
between motivation and attitude (Singh et al., 2002;
Tuan et al., 2005). In all of the applications, there
was a statistically significant relationship between
motivation and attitude. Therefore, this result showed
that the measurements had convergent validity.

Based on the findings of this study, it is suggested
that data collector characteristics be taken
into account when Likert-type instruments are
used to collect data on motivation and attitude. At
the same time, other affective variables such as
self-efficacy and anxiety should also be examined using
a similar approach.